Binary fluorescence time series obtained from single-molecule imaging experiments can be used to infer protein
binding kinetics, in particular association and dissociation rate constants, from waiting-time statistics of
fluorescence intensity changes. In many cases, rate constants inferred from fluorescence time series exhibit a
nonintuitive dependence on ligand concentration. Here we examine several possible mechanistic and technical
origins of this ligand dependence. Using aggregated Markov models, we show that under the condition of detailed
balance, non-fluorescent binding and missed events due to transient interactions, rather than conformation
fluctuations, may underlie the dependence of waiting times, and thus of apparent rate constants, on ligand
concentration. In general, waiting times are rational functions of ligand concentration. The shape of the
concentration dependence is qualitatively affected by the number of binding sites on the single molecule and is
quantitatively tuned by model parameters. We also show that ligand dependence can be caused by non-equilibrium
conditions, which violate detailed balance and require an energy source. As a different but significant
mechanism, we examine the effect of ambient buffers, which can substantially reduce the effective concentration
of ligands that interact with the single molecules. To demonstrate the effects of these mechanisms, we apply our
results to the concentration dependence observed in a single-molecule experiment on EGFR binding to the
fluorophore-labeled adaptor protein Grb2 by Morimatsu et al.



I. INTRODUCTION 

Single-molecule fluorescence techniques measure real-time kinetics of
chemical reactions, such as RNA folding, enzymatic reactions, and
protein-protein, protein-oligonucleotide and protein-DNA binding. An
important application of single-molecule fluorescence techniques is to
probe the binding kinetics between interaction partners and to infer
binding rate constants from the fluorescence time series. In contrast to
conventional ensemble-averaged measurements, single-molecule fluorescence
techniques directly observe the stochasticity of binding at the molecular
level, allowing investigation of conformation fluctuations of individual
molecules under various conditions.

Single-molecule fluorescence time series often exhibit transitions
alternating between two observable states: fluorescent (on) and
non-fluorescent (off). One can summarize the information in such "binary"
time series using one-dimensional waiting time distributions, and the
kinetic rate constants for association ($k_{\mathrm{on}}$) and dissociation
($k_{\mathrm{off}}$) can then be derived from the mean waiting times.
Common experimental procedures involve measuring single-molecule binding
fluorescence time series at varied ligand concentrations.

These waiting time distributions for protein binding are in many cases fit
by sums of multiple exponentials or empirically by stretched exponentials,
suggesting that the conformation of the single molecule fluctuates during
the course of the interaction. Transitions between the two macroscopic
states ("on" and "off") therefore proceed through diverse conformational
channels connecting microscopic states that cannot be directly
distinguished in the fluorescence time series. Such temporal variations in
protein conformation are referred to as dynamic disorder if the
conformations fluctuate on a time scale comparable to that of protein
binding.

To analyze a binary fluorescence time series, the kinetics of binding
between a single molecule and its ligand is usually described by the simple
phenomenological two-state model:

$$\mathrm{off} \;\underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}[L]}{\rightleftharpoons}}\; \mathrm{on} \tag{1}$$

where state "off" indicates that the molecule is free, state "on" indicates
that the molecule is bound to a ligand, and the transition rate from "off"
to "on" is proportional to the ligand concentration $[L]$ by the law of
mass action. One can obtain the apparent rate constants $k_{\mathrm{on}}$
and $k_{\mathrm{off}}$ from the mean waiting times $\tau_{\mathrm{off}}$
and $\tau_{\mathrm{on}}$, respectively, as:

$$k_{\mathrm{on}} = \frac{1}{\tau_{\mathrm{off}}[L]}, \qquad k_{\mathrm{off}} = \frac{1}{\tau_{\mathrm{on}}} \tag{2}$$

where $\tau_{\mathrm{off}}$ is the mean waiting time for association (or
the mean dwell time in the macroscopic "off" state) and
$\tau_{\mathrm{on}}$ is the mean waiting time for dissociation (or the mean
dwell time in the macroscopic "on" state). The apparent dissociation
constant is then given as $K_d = k_{\mathrm{off}}/k_{\mathrm{on}}$. In this
two-state model, when forward and backward transitions are Markovian, in
other words, if the two-state model is biochemically elementary, the rate
constants $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ are independent of
$[L]$. However, measured single-molecule kinetics may be affected by
various mechanistic and technical factors, including protein
concentrations, protein conformations and cellular environments. These
factors may cause significant ligand concentration dependence of the
kinetic rate constants inferred from the mean waiting times via Eq. (2).
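As a minimal illustration of Eq. (2), the sketch below converts mean dwell times into apparent rate constants. The function name and the numerical dwell times are hypothetical; they are chosen only to show that, for an elementary two-state process in which the mean off-time scales as $1/[L]$, the apparent association rate constant does not change with concentration.

```python
def apparent_rate_constants(tau_on, tau_off, ligand_conc):
    """Apparent rate constants from mean dwell times, following Eq. (2).

    tau_on, tau_off : mean dwell times in the "on" and "off" states (s)
    ligand_conc     : ligand concentration [L] (any concentration unit)
    """
    if min(tau_on, tau_off, ligand_conc) <= 0:
        raise ValueError("dwell times and concentration must be positive")
    k_on = 1.0 / (tau_off * ligand_conc)   # per concentration unit per second
    k_off = 1.0 / tau_on                   # per second
    k_d = k_off / k_on                     # same unit as ligand_conc
    return k_on, k_off, k_d


# Hypothetical dwell times at two concentrations. tau_off drops tenfold
# when [L] rises tenfold, so the apparent k_on is the same in both cases.
k_on_low, k_off_low, kd_low = apparent_rate_constants(0.25, 10.0, 0.01)
k_on_high, k_off_high, kd_high = apparent_rate_constants(0.25, 1.0, 0.1)
```

A measured `k_on` that differs between the two concentrations would indicate that the simple two-state picture is not adequate, which is the situation the rest of the paper addresses.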



In particular, transitions between observed "on" and "off" states can be
complex due to conformation fluctuations, such that the minimalistic
two-state model of Eq. (1) is non-Markovian and thus not adequate to
describe the true transition mechanism. In such cases, mechanistic models
are required to analyze the data. Here, we examine several possible
mechanisms that might cause ligand dependence of the kinetic rate constants
$k_{\mathrm{on}}$ and $k_{\mathrm{off}}$. Conformation fluctuations in
single molecules are described by aggregated Markov models, as is commonly
done in the analysis of single ion channel recordings. In addition, we show
that non-equilibrium models that violate the detailed balance constraint
can also generate strong ligand concentration dependence of rate constants.



We further examine the ligand dependence caused by two technical sources
that are not directly related to the binding mechanism: (1) missed events,
and (2) background buffer. The former concerns transient events that are
not captured by observation, which leads to overestimated waiting times and
distorts their ligand dependence. We show that missed events affect waiting
times in a way similar to models of a protein with a single binding site
and non-fluorescent binding. The latter concerns ligand interactions with
ambient buffer molecules, which substantially reduce the effective
concentration of ligands that bind to the single molecules. Such a
background buffering mechanism, if unaccounted for, causes a strong
dependence of the apparent association rate constant on the total ligand
concentration.



We apply our models to analyze an in vitro experiment on binding of the
adaptor protein Grb2 to single epidermal growth factor receptor (EGFR)
molecules by Morimatsu et al. The experiment showed that waiting time
distributions decayed as multiple exponentials, suggesting that the EGFR
molecule may undergo conformational changes on time scales comparable to
Grb2 binding. Moreover, the apparent association rate constant
$k_{\mathrm{on}}$ had a counter-intuitive dependence on the Grb2
concentration.



II. THEORY 

A. Aggregated Markov models 

The theory of aggregated Markov models was developed for analyzing ion
channel gating mechanisms from single ion channel recordings [12, 13].
Similar Markov models have been developed more recently for analyzing time
series obtained from single-molecule fluorescence experiments. An
aggregated Markov model is a special case of a hidden Markov chain, in
which the states in a particular category (an aggregate) correspond to a
signal of identical fluorescence intensity. In a binary single-molecule
model, the system fluoresces in states in the "on" aggregate and does not
fluoresce in states in the "off" aggregate.

An aggregated Markov model for a single molecule can be fully characterized
by the "generator matrix", $Q$, whose off-diagonal structure encodes the
reaction scheme. Entry $q_{ij}$ is the transition rate from state $i$ to
state $j$. The diagonal entries are defined as
$q_{ii} = -\sum_{j \neq i} q_{ij}$. In matrix form, one can write $Qu = 0$,
where $u$ is the right null vector of all ones. The steady-state occupancy
vector $w$ is the normalized left null vector of $Q$, i.e., $wQ = 0$ and
$\sum_i w_i = 1$. For systems with aggregates "on" and "off", $Q$ can be
organized and partitioned as

$$Q = \begin{bmatrix} Q_{oo} & Q_{oc} \\ Q_{co} & Q_{cc} \end{bmatrix} \tag{3}$$

where the diagonal blocks contain intra-aggregate transition rates and the
off-diagonal blocks contain inter-aggregate transition rates; the
subscripts $o$ and $c$ in Eq. (3) denote the "on" and "off" aggregates,
respectively. We can correspondingly partition the null vectors:
$w = [w_o \; w_c]$ and $u = [u_o \; u_c]^T$. The mean waiting times are
given as

$$\tau_{\mathrm{on}} = \frac{w_o u_o}{w_o Q_{oc} u_c}, \qquad \tau_{\mathrm{off}} = \frac{w_c u_c}{w_c Q_{co} u_o} \tag{4}$$

each of which is the ratio of a steady-state aggregate occupancy,
$P_{\mathrm{on}} = w_o u_o$ or $P_{\mathrm{off}} = w_c u_c$, to the total
inter-aggregate probability flux (e.g., $J_{oc} = w_o Q_{oc} u_c$). The
inter-aggregate probability fluxes are balanced at steady state, i.e.,
$J_{oc} = J_{co}$.
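The occupancy-over-flux form of the mean waiting times can be checked numerically for small schemes. The following sketch, in plain Python, solves $wQ = 0$ with $\sum_i w_i = 1$ by Gaussian elimination and evaluates the two ratios. The function names and the two-state example rates are illustrative assumptions and are not part of the original analysis.

```python
def steady_state(Q):
    """Solve w Q = 0 with sum(w) = 1 by Gauss-Jordan elimination (small models)."""
    n = len(Q)
    # Rows of the transposed system Q^T w^T = 0, with an augmented column.
    A = [[Q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    # Replace the last (redundant) equation by the normalization sum(w) = 1.
    A[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def mean_dwell_times(Q, on_states):
    """Return (tau_on, tau_off): aggregate occupancy over inter-aggregate flux."""
    n = len(Q)
    on = set(on_states)
    off = [i for i in range(n) if i not in on]
    w = steady_state(Q)
    p_on = sum(w[i] for i in on)
    p_off = sum(w[i] for i in off)
    j_oc = sum(w[i] * Q[i][j] for i in on for j in off)   # flux on -> off
    j_co = sum(w[i] * Q[i][j] for i in off for j in on)   # flux off -> on
    return p_on / j_oc, p_off / j_co


# Two-state check: state 0 is "on", state 1 is "off" (hypothetical rates).
k_on, k_off, L = 2.0, 4.0, 0.5
Q = [[-k_off, k_off],
     [k_on * L, -k_on * L]]
tau_on, tau_off = mean_dwell_times(Q, on_states=[0])
```

For the two-state scheme the result reduces to `tau_on = 1/k_off` and `tau_off = 1/(k_on*L)`; larger generator matrices can be passed in the same way as long as the chain is irreducible.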



B. Detailed balance 

The principle of microscopic reversibility (or the law of detailed balance)
states that at thermodynamic equilibrium, for any reversible transition
between two neighboring states $i$ and $j$, the probability flux from state
$i$ to state $j$ is balanced by that from $j$ to $i$, i.e.,
$w_i q_{ij} = w_j q_{ji}$. In general, the occupancy $w_i$ has a complex
relationship with the ligand concentration $[L]$. However, under detailed
balance, one can derive a relative occupancy $\tilde{w}$, in which each
entry has a monomial dependence on $[L]$. This treatment eases the analysis
of the ligand dependence of $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$. By
the law of mass action, the rate of ligand binding is proportional to
$[L]$, whereas the reverse dissociation rate is a constant. With the
detailed balance condition, for two neighboring states $i$ and $j$, we have

$$\frac{w_i}{w_j} = \frac{q_{ji}}{q_{ij}} = K_{ij}[L]^{b} \tag{5}$$

where $b = -1$ if the state transition from $i$ to $j$ is induced by ligand
binding, $b = 1$ if the state transition from $j$ to $i$ is induced by
ligand binding, and $b = 0$ if neither the forward nor the backward
transition involves ligand binding. The coefficient $K_{ij}$ is a constant.
Designate a reference state $r$ in which the molecule has no ligand bound.
We define the relative occupancy as the ratio $\tilde{w}_i = w_i/w_r$, and
note that states $i$ and $r$ are connected by a path involving one or more
transitions. Applying Eq. (5) successively along a path in the reaction
scheme from state $i$ to state $r$, we can show that
$\tilde{w}_i = k_i[L]^{n_i}$, where the non-negative integer $n_i$ is the
number of ligands bound in state $i$ and $k_i$ is the product of the
equilibrium constants for all the reversible reactions along the path from
state $i$ to state $r$. The numerical value of $k_i$ depends on the choice
of the reference state $r$ but, owing to detailed balance, does not depend
on the choice of the path connecting states $i$ and $r$. The mean waiting
times are now given as

$$\tau_{\mathrm{on}} = \frac{\tilde{w}_o u_o}{\tilde{w}_o Q_{oc} u_c}, \qquad \tau_{\mathrm{off}} = \frac{\tilde{w}_c u_c}{\tilde{w}_c Q_{co} u_o} \tag{6}$$

From the above derivation, it is evident that in general both
$\tau_{\mathrm{on}}$ and $\tau_{\mathrm{off}}$ are ligand dependent as
rational functions of $[L]$.

FIG. 1. Template I consists of an "on" aggregate in which the receptor has
a ligand bound, and an "off" aggregate with no ligand bound. Models
constructed from Template I may have any number of "on" and "off" states
with any connectivity. Rates for transitions from "off" states (empty
circles) to "on" states (filled circles) are proportional to the ligand
concentration $[L]$. All reverse rates for transitions from "on" states to
"off" states are constant, and all intra-aggregate transitions are
spontaneous and have constant transition rates. Template II extends
Template I with an unresolved ligand-bound "off" aggregate (gray circles).
Template III describes a single molecule with two binding sites. One site
is fluorescent (dark dot) when binding to a ligand, whereas the other is
non-fluorescent (gray dot). Again there can be any number of states in each
of the four aggregates of Template III.



C. Model templates 

Instead of using models with specific mechanisms, we 
discuss the apparent rate constants under different model 
classes depicted in Fig. 1 by "template" reaction schemes. 
A template consists of several different aggregates. Tem- 
plate I has two aggregates, an "on" aggregate which 
has ligand bound and an "off" aggregate with no ligand 
bound. Models constructed from Templates I and II can 
have any number of on and off states and any connectiv- 
ity. For Template III, the connectivity is arbitrary except 
that no state in the unliganded aggregate can be directly 
linked to states in the doubly liganded aggregate. We 
consider models with two experimentally-distinguishable 
fluorescent ("on") and non-fluorescent ("off") aggregates 
with three categories of states for the single molecule: (i) 
a dark state ("off") with no ligand bound, (ii) a ligand- 
bound dark state ("off"), and (iii) a bright state ("on") 
which has ligand bound. The advantage of this approach 
is that our conclusions do not depend on model details 
such as the number of states and the connectivity be- 
tween states but only on key biochemical aspects such 
as the number of binding sites and whether the bind- 
ing protein fluoresces while associating with the recep- 
tor. Although a waiting time distribution that is a sum 
of multiple exponentials depends on the number of states, 
the ligand dependence of the apparent association rate 
does not. 



III. RESULTS 

A. Conformation fluctuations do not cause 
concentration-dependence of rate constants 

We first consider a single molecule that has a sin- 
gle ligand binding site. We assume that every ligand 
binding event is experimentally observed, which there- 
fore switches the molecule from an "off" state to an "on" 
state. Each ligand departure switches the molecule from 
an "on" state to an "off" state. Intra-aggregate state 
transitions do not involve ligand arrival or departure. 
Clearly, any such kinetic model can be constructed from 
a two-state base scheme as shown in Template I (Fig. 1). Conformation
fluctuations can be modeled by extending the base scheme with multiple "on"
and "off" states with arbitrary connections between states. We can write
$\tilde{w}_o = k_o[L]$ and $\tilde{w}_c = k_c$, where $k_o$ and $k_c$ are
constant vectors. The mean waiting times are given as

$$\tau_{\mathrm{on}} = \frac{k_o u_o}{k_o Q_{oc} u_c}, \qquad \tau_{\mathrm{off}} = \frac{k_c u_c}{[L]\, k_o Q_{oc} u_c} \tag{7}$$

Since $Q_{oc}$ contains only ligand dissociation rate constants,
$\tau_{\mathrm{on}}$ is a constant and $\tau_{\mathrm{off}}$ is inversely
proportional to $[L]$. Thus, we have shown that conformation fluctuations
do not generate ligand concentration dependence of the apparent rate
constants, $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$, as defined in Eq. (2).
This result holds for a reaction scheme with an arbitrary number of states
and arbitrary connections between states as extended from Template I, as
long as each binding event is directly observed in the experiment. Notice
that in such a case $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ calculated
from the mean waiting times are in fact identical to those obtained in
ensemble-averaged measurements.
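The cancellation of $[L]$ in a Template I model can be made concrete with a small numerical sketch. It assumes a hypothetical model with two "on" and two "off" states, described directly by relative occupancies under detailed balance; all parameter values are made up for illustration, and the off-time uses the on-to-off flux because the two inter-aggregate fluxes are equal at steady state.

```python
def template1_dwell_times(L, k_o, k_c, Q_oc):
    """Mean dwell times of a Template I model from relative occupancies.

    k_o  : constants such that the relative occupancy of each "on" state is k_o[i]*L
    k_c  : relative occupancies of the "off" states (no ligand bound)
    Q_oc : dissociation rates, Q_oc[i][j] from "on" state i to "off" state j
    """
    w_o = [k * L for k in k_o]
    w_c = list(k_c)
    # On -> off probability flux; equal to the off -> on flux at steady state.
    flux = sum(w * sum(row) for w, row in zip(w_o, Q_oc))
    return sum(w_o) / flux, sum(w_c) / flux


k_o = [3.0, 6.0]                 # hypothetical
k_c = [1.0, 0.5]                 # hypothetical
Q_oc = [[2.0, 0.0], [1.0, 4.0]]  # hypothetical dissociation rates (1/s)

apparent_k_on = []
for L in (0.01, 0.1, 1.0, 10.0):
    tau_on, tau_off = template1_dwell_times(L, k_o, k_c, Q_oc)
    apparent_k_on.append(1.0 / (tau_off * L))   # Eq. (2)
```

Every entry of `apparent_k_on` is the same number, and `tau_on` does not depend on `L`, regardless of how many conformational states are added to either aggregate.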

B. Effect of non-fluorescent binding 

In some cases, ligand binding may not be resolved experimentally and goes
unnoticed, which can cause ligand dependence as we show below. We consider
a model that incorporates non-fluorescent ligand binding to the single
molecule. A model (Template II, Fig. 1) of a molecule that has a single
binding site allows a "dark" conformation (a "c" state) in which a bound
ligand does not fluoresce (for example, because of transient interactions).
From Eq. (6), models from Template II give an $[L]$-independent
$\tau_{\mathrm{on}}$ and a $\tau_{\mathrm{off}}$ that has a linear
dependence on $1/[L]$. In particular, one special acyclic scheme from
Template II describes the following two-step binding model:

$$\mathrm{off} \;\underset{k_{-1}}{\overset{k_1[L]}{\rightleftharpoons}}\; \mathrm{dark} \;\underset{k_{-2}}{\overset{k_2}{\rightleftharpoons}}\; \mathrm{on} \tag{8}$$

where the first step from the "off" state to the non-fluorescent "dark"
state represents a ligand-receptor contact due to ligand diffusion, and the
second step to the "on" state models a reaction-limited transition. An
alternative mechanism is the 2D lateral diffusion of a ligand on the cell
surface in search of a binding molecule after 3D diffusion from the bulk
solution onto the cell surface, which may also complicate the analysis of
the mean waiting times. From the model in Eq. (8), the apparent association
rate constant is given as:

$$k_{\mathrm{on}} = \frac{k_1 k_2}{k_{-1} + k_1[L]} \tag{9}$$

where $k_1$ can be considered as the diffusion-limited rate constant. Note
that Eq. (9) contains only two free parameters.
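As a cross-check on the two-step scheme, the closed form $k_{\mathrm{on}} = k_1 k_2/(k_{-1} + k_1[L])$, obtained here from the occupancy-over-flux ratio for the scheme in Eq. (8), can be compared with an independent first-passage calculation of the mean off-time. This is a sketch under stated assumptions: the function names and the rate constants are hypothetical, and the first-passage equations are written out by hand for this three-state chain only.

```python
def k_on_two_step(L, k1, km1, k2):
    """Apparent association rate constant of off <-> dark <-> on."""
    return k1 * k2 / (km1 + k1 * L)


def mean_off_time_first_passage(L, k1, km1, k2):
    """Mean time to reach "on" starting from "dark".

    Every non-fluorescent sojourn starts in "dark", because "on" decays
    only to "dark". The first-passage equations are
        T_dark = 1/(km1 + k2) + km1/(km1 + k2) * T_off
        T_off  = 1/(k1*L) + T_dark
    and are solved here for T_dark.
    """
    exit_dark = km1 + k2
    p_back = km1 / exit_dark            # probability of returning to "off"
    return (1.0 / exit_dark + p_back / (k1 * L)) / (1.0 - p_back)


k1, km1, k2 = 100.0, 5.0, 0.5           # hypothetical rate constants
for L in (0.001, 0.01, 0.1, 1.0):
    t_off = mean_off_time_first_passage(L, k1, km1, k2)
    # Eq. (2): the apparent k_on is 1/(tau_off * [L]).
    assert abs(1.0 / (t_off * L) - k_on_two_step(L, k1, km1, k2)) < 1e-9
```

The two routes agree, and both show the apparent association rate constant falling from $k_1 k_2/k_{-1}$ at low concentration toward zero as $[L]$ grows.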

FIG. 2. Dependence of the apparent association rate constant on the Grb2
concentration. Results are produced by fitting Template II (dashed curve),
Template III (dotted curve) and the buffering model (solid curve) to the
experimental data (circles) measured by Morimatsu et al.

Morimatsu et al. hypothesized that frequent, short-lived Grb2-EGFR
interactions that escaped the instrumental resolution may induce
conformational memory in EGFR molecules and thus account for the
concentration dependence of the mean off-time. In their experiments, single
EGFR molecules were monitored by total internal reflection fluorescence
microscopy (TIR-FM) for binding and dissociation of fluorophore
(Cy3)-labeled adaptor protein Grb2, which reversibly binds to specific
phospho-tyrosine residues on EGFR. Statistics of the fluorescence time
series showed that $k_{\mathrm{on}}$ decreased from
220 $\mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$ to
7.1 $\mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$ when the Grb2 concentration
increased from 0.1 to 100 nM, whereas the apparent dissociation rate
constant $k_{\mathrm{off}}$ was found to be constant at about
3.4 $\mathrm{s}^{-1}$. To apply this model to the data reported by
Morimatsu et al., we assume that the second-order association rate constant
is diffusion-limited,
$k_1 = 4\pi D s = 1.51 \times 10^{3}\ \mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$,
where the diffusion coefficient $D = 100\ \mu\mathrm{m}^2\,\mathrm{s}^{-1}$
and the spherical contact radius $s = 2$ nm. The best fit to the data gives
$k_{-1} = 4.54\ \mathrm{s}^{-1}$ and $k_2 = 0.42\ \mathrm{s}^{-1}$;
$k_{-2}$ is obtained from the mean on-time:
$k_{-2} = 1/\tau_{\mathrm{on}} = 3.4\ \mathrm{s}^{-1}$. As shown by the fit
in Fig. 2, even though $k_{\mathrm{on}}$ qualitatively decreases with
increasing $[L]$, this scheme does not fully reproduce the observed
dependence on $[L]$, suggesting that an alternative mechanism might better
account for the intriguing results reported by Morimatsu et al.
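The diffusion-limited value quoted in the text follows from a unit conversion of $4\pi D s$, which the sketch below spells out. The helper name is hypothetical; the inputs ($D = 100\ \mu\mathrm{m}^2\,\mathrm{s}^{-1}$, $s = 2$ nm) and the fitted $k_{-1}$ and $k_2$ are the values given in the text, and the closed form used for the apparent rate constant is the two-step expression $k_1 k_2/(k_{-1} + k_1[L])$.

```python
import math

AVOGADRO = 6.02214076e23            # molecules per mole


def smoluchowski_rate_uM(D_um2_per_s, radius_nm):
    """Diffusion-limited rate constant 4*pi*D*s in uM^-1 s^-1."""
    s_um = radius_nm * 1e-3                      # nm -> um
    k_um3 = 4.0 * math.pi * D_um2_per_s * s_um   # um^3 per molecule per second
    k_per_molar = k_um3 * 1e-15 * AVOGADRO       # 1 um^3 = 1e-15 L
    return k_per_molar * 1e-6                    # M^-1 s^-1 -> uM^-1 s^-1


k1 = smoluchowski_rate_uM(100.0, 2.0)            # about 1.51e3 uM^-1 s^-1
km1, k2 = 4.54, 0.42                             # fitted values from the text (1/s)

# Apparent k_on at the two ends of the measured range (0.1 nM and 100 nM).
k_on_low = k1 * k2 / (km1 + k1 * 1e-4)
k_on_high = k1 * k2 / (km1 + k1 * 1e-1)
```

With these numbers the model stays well below the measured 220 $\mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$ at the low-concentration end, in line with the statement that this scheme captures the trend only qualitatively.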



C. Molecule with multiple binding sites 

A single molecule with multiple ligand binding sites may give rise to a
ligand dependence of the apparent rate constants that differs in form from
that of a molecule with a single binding site. One cause of ligand
dependence is that transient ligand binding escapes observation, as
discussed below in the section on missed events (Section III E). Here, we
instead assume the existence of a site that does not fluoresce for an
unknown reason, which results in a ligand dependence of
$\tau_{\mathrm{off}}$ different from that of the above model with a single
non-fluorescent binding site.

With Template III, we consider a molecule that has two binding sites, as
shown in Fig. 1. Although each reversible reaction in Template III involves
ligand binding, only one site is monitored for binding. Models constructed
from Template III have four state classes: (i) both sites unbound ("off"),
(ii) ligand bound to the dark site ("off"), (iii) ligand bound to the
bright site ("on"), and (iv) ligand bound to both sites ("on"). We
calculate the relative occupancy based on the rules described in the
previous section (Sec. III B). Then, by Eq. (6), the mean on-time is given
by

$$\tau_{\mathrm{on}} = \frac{k_{-22} + k_{22}[L]}{k_{-11}k_{-22} + k_{-12}k_{22}[L]} \tag{10}$$

which ranges from $1/k_{-11}$ to $1/k_{-12}$ as $[L]$ increases from zero
to infinity. When the ligand concentration is small, the kinetics of the
model is biased toward the transitions between the upper two states in
Template III, whereas the kinetics shifts to the transitions between the
lower two states when the ligand concentration becomes large. If the off
rates $k_{-11}$ and $k_{-12}$ for the fluorescent ligand to dissociate from
the single molecule, without or with the non-fluorescent ligand bound, are
nearly identical, $\tau_{\mathrm{on}}$ is independent of $[L]$, and an
experiment in this case can only resolve the off rate for the fluorescent
binding site. The mean off-time is given by:

$$\tau_{\mathrm{off}} = \frac{k_{-21}k_{-11}k_{-22} + k_{-11}k_{-22}k_{21}[L]}{k_{-21}k_{-11}k_{-22}k_{11}[L] + k_{-12}k_{-21}k_{11}k_{22}[L]^2} \tag{11}$$

Using the above equation and the condition of detailed balance, the
apparent association rate constant is given as

$$k_{\mathrm{on}} = \frac{k_{11} + k_{12}\dfrac{k_{21}}{k_{-21}}[L]}{1 + \dfrac{k_{21}}{k_{-21}}[L]} \tag{12}$$

As $[L]$ increases from zero to infinity, $k_{\mathrm{on}}$ goes from
$k_{11}$ to $k_{12}$. If ligand binding to the fluorescent site is
independent of binding to the non-fluorescent site (see Fig. 1), i.e.,
$k_{11} = k_{12}$, then $k_{\mathrm{on}}$ does not have ligand dependence.
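The two-site expressions can be evaluated together to confirm that they are mutually consistent. The sketch below writes Eqs. (10) and (11) in terms of equilibrium constants (dividing numerator and denominator by the dissociation rates), imposes detailed balance around the four-state cycle, and compares the result with Eq. (12). The function name and all parameter values are hypothetical and serve only as an illustration.

```python
def template3_observables(L, k11, km11, k12, km12, K21):
    """Return (tau_on, tau_off, k_on) for the two-site model.

    k11, km11 : on/off rates of the bright site when the dark site is empty
    k12, km12 : on/off rates of the bright site when the dark site is occupied
    K21       : equilibrium constant k21/k_-21 of the dark site (bright site empty)
    """
    K11 = k11 / km11
    K12 = k12 / km12
    K22 = K21 * K12 / K11        # detailed balance: K11*K22 = K21*K12
    tau_on = (1.0 + K22 * L) / (km11 + km12 * K22 * L)                  # Eq. (10)
    tau_off = (1.0 + K21 * L) / (k11 * L + K11 * K22 * km12 * L ** 2)   # Eq. (11)
    k_on = (k11 + k12 * K21 * L) / (1.0 + K21 * L)                      # Eq. (12)
    return tau_on, tau_off, k_on


# Hypothetical parameters: concentrations in uM, rates in 1/s or 1/(uM s).
params = dict(k11=200.0, km11=3.4, k12=5.0, km12=3.4, K21=2000.0)
tau_on, tau_off, k_on = template3_observables(0.01, **params)
consistency = k_on * tau_off * 0.01      # equals 1 if Eq. (12) follows from Eq. (11)
```

With equal off rates for the bright site, `tau_on` stays at `1/km11` for every concentration, while `k_on` moves monotonically from `k11` toward `k12`.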

We use Eq. (12) to fit the data from Morimatsu et al. In fact, more than
one site on the EGFR molecule, including the phospho-tyrosine sites Y1068
and Y1086, has been identified as a Grb2 binding site. As shown in Fig. 2
(dotted curve), the best-fit models from Template III agree more closely
with the data. The fit gives
$k_{11} = 2.16 \times 10^{2}\ \mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$,
$k_{12} = [\ldots]\ \mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$ and
$k_{21}/k_{-21} = 2.02 \times 10^{3}\ \mu\mathrm{M}^{-1}$. Since the
off-rates $k_{-11} = k_{-12} = 3.4\ \mathrm{s}^{-1}$, these results
indicate a nearly two orders of magnitude reduction in ligand affinity for
the second ligand site after the first ligand binding, suggesting negative
cooperativity between the two binding sites. The parameters also indicate a
high-affinity non-fluorescent binding with a dissociation constant
$k_{-21}/k_{21} \approx 0.5$ nM.



D. Effect of detailed balance violation 

The above analysis breaks down when the assumption of detailed balance
becomes invalid, that is, when the system reaches a non-equilibrium steady
state because reactions are driven by an implicit external energy source
such as a sustained chemical or electrical potential. The results of
Eq. (6) are not applicable when detailed balance does not hold. The
steady-state occupancy $w$ must then be obtained by other means (see
Appendix).

FIG. 3. A three-state cyclic model with two "off" states and one "on"
state. Transitions from "off" to "on" are induced by ligand binding, with
transition rates proportional to the ligand concentration $[L]$.

Here, we first use a minimalistic three-state model (Fig. 3) that contains
a reaction loop to show that violation of detailed balance causes ligand
dependence of rate constants. The model has two "off" states without ligand
bound and one "on" state with a ligand bound. To isolate the effect of the
violation of detailed balance, the model does not invoke any
non-fluorescent "dark" state. One can derive the mean waiting times as (see
Appendix):

$$\tau_{\mathrm{on}} = \frac{1}{k_{31} + k_{32}} \tag{13}$$

$$\tau_{\mathrm{off}} = \frac{(k_{12} + k_{21})(k_{31} + k_{32}) + (k_{23}k_{31} + k_{13}k_{32})[L]}{(k_{31} + k_{32})\left(k_{12}k_{23} + k_{13}(k_{21} + k_{23}[L])\right)[L]} \tag{14}$$

We note that $\tau_{\mathrm{on}}$ is constant regardless of whether
detailed balance holds. This is a special case for this particular model.
In general, violation of detailed balance causes ligand dependence in both
the mean "on" and "off" times. The association rate constant
$k_{\mathrm{on}}$ has the same structure as Eq. (12) from Template III and
would achieve the same best fit to a time series data set as a model
constructed from Template III.

Under the condition of detailed balance
($k_{13}k_{32}k_{21} = k_{31}k_{12}k_{23}$), the mean off-time
$\tau_{\mathrm{off}}$ reduces to:

$$\tau_{\mathrm{off}} = \frac{k_{13}k_{32} + k_{23}k_{31}}{k_{13}k_{23}(k_{31} + k_{32})[L]} \tag{15}$$

which is inversely proportional to $[L]$ and is identical to the result
obtained using Eq. (6).
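The effect of breaking detailed balance in the three-state cycle can be seen by evaluating the general mean off-time of Eq. (14) for two parameter sets. In this sketch, states 1 and 2 are "off" and state 3 is "on"; the function name and the rate values are hypothetical. The first set satisfies $k_{13}k_{32}k_{21} = k_{31}k_{12}k_{23}$ and the second changes $k_{32}$ alone so that the cycle condition fails.

```python
def tau_off_cyclic(L, k12, k21, k13, k31, k23, k32):
    """Mean off-time of the three-state cyclic model, Eq. (14)."""
    num = (k12 + k21) * (k31 + k32) + (k23 * k31 + k13 * k32) * L
    den = (k31 + k32) * (k12 * k23 + k13 * (k21 + k23 * L)) * L
    return num / den


def apparent_k_on(L, **rates):
    """Apparent association rate constant from Eq. (2)."""
    return 1.0 / (tau_off_cyclic(L, **rates) * L)


# Hypothetical rates. With k32 = 4 the cycle condition holds:
# k13*k32*k21 = 3*4*2 = 24 and k31*k12*k23 = 4*1*6 = 24.
balanced = dict(k12=1.0, k21=2.0, k13=3.0, k31=4.0, k23=6.0, k32=4.0)
driven = dict(balanced, k32=0.4)     # same rates except k32: condition violated

k_on_balanced = [apparent_k_on(L, **balanced) for L in (0.1, 1.0, 10.0)]
k_on_driven = [apparent_k_on(L, **driven) for L in (0.1, 1.0, 10.0)]
```

The balanced set returns the same `k_on` at every concentration, as Eq. (15) predicts, whereas the driven set gives values that change with `L`.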

For an arbitrary reaction scheme, we show (see Appendix for a detailed
derivation using a graph-theoretic method, Fig. 3) that the apparent rate
constants are given as rational functions of $[L]$:

$$k_{\mathrm{on}} = \frac{J}{P_{\mathrm{off}}[L]}, \qquad k_{\mathrm{off}} = \frac{J}{P_{\mathrm{on}}} \tag{16}$$

where $P_{\mathrm{on}}$ and $P_{\mathrm{off}}$ are the unnormalized
steady-state occupancies of the "on" and "off" aggregates, respectively,
and $J$ is an unnormalized inter-aggregate probability flux. These three
terms are all polynomials in the ligand concentration $[L]$. Note that for
both $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ the numerator and denominator
polynomials have corresponding terms with the same powers of $[L]$ (see
Appendix for the mathematical derivation). The exact form of the
polynomials is specific to the model topology, and the coefficients of the
polynomials are expressed in terms of model parameters. If the parameters
of a model satisfy detailed balance, the ligand-dependent terms in the
above equations factor out of both the numerator and the denominator and
cancel, leaving $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ ligand
independent. The in vitro experiments by Morimatsu et al. appear to have
been done under equilibrium conditions, so violation of detailed balance is
an unlikely explanation for the observed ligand dependence of
$\tau_{\mathrm{off}}$.

E. Effect of missed events 

A missed event is a short-lived binding event that escapes the instrumental
resolution because it cannot be distinguished from the background noise or
because the detector has an intrinsic dead time. Unaccounted missed events
distort waiting time distributions and increase mean waiting times. This
issue was studied extensively in the field of single ion channel recordings
[19, 20]. Here, we show that missed events may cause $k_{\mathrm{on}}$ and
$k_{\mathrm{off}}$ to depend on $[L]$.

Here we analyze the effect of missed events using the two-state model in
Eq. (1). Assume that the measurement has a fixed dead time $\sigma$ and
that an event is missed if its waiting time is shorter than $\sigma$. The
apparent mean off-time, $\tilde{\tau}_{\mathrm{off}}$, is given as (see
Appendix):

$$\tilde{\tau}_{\mathrm{off}} = \sigma + \sum_{k=0}^{\infty}\left[\,\ldots\,\right] = \tau_{\mathrm{off}} + p_{\sigma}\tau_{\mathrm{on}} \tag{17}$$

where the $\sigma$ accounts for the dead time skipped before the onset of
the next detectable on-time interval. The effect of missed events on
$\tau_{\mathrm{on}}$ is negligible at small ligand concentrations, which
induce only infrequent binding. This condition is often satisfied in TIR-FM
experiments because binding events must be rare enough to reduce spatial
crowding and background noise and thus to allow detection of changes in the
level of the fluorescence signal. This technical requirement limits the
concentration to the order of 10 nM [21]. The association rate constant is
obtained as

$$k_{\mathrm{on}} = \frac{1}{\tilde{\tau}_{\mathrm{off}}[L]} = \frac{1}{1/k_{+} + p_{\sigma}\tau_{\mathrm{on}}[L]} \tag{18}$$

where $k_{+}$ denotes the association rate constant of the underlying
two-state model, so that $\tau_{\mathrm{off}} = 1/(k_{+}[L])$. This result
is mathematically equivalent to that for the single-site protein with
non-fluorescent interactions (Eq. (9)). The model can be derived from
Template II without the transitions between the "on" and "dark" states:

$$\mathrm{dark} \;\underset{[L]}{\rightleftharpoons}\; \mathrm{off} \;\overset{[L]}{\rightleftharpoons}\; \mathrm{on} \tag{19}$$

where the ligand binding from "off" to "dark" is used to model the missed
events.
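The qualitative claim that missed events lengthen the apparent off-time can be illustrated with a small Monte Carlo sketch of a two-state process observed with a dead time. The censoring rule used here is a simplifying assumption made for illustration, not the derivation in the Appendix: an "on" interval shorter than the dead time is treated as undetected and is merged into the surrounding "off" interval, and detected intervals are left unchanged. The function name and the rate values are hypothetical.

```python
import random


def apparent_mean_off_time(k_plus_L, k_minus, dead_time, n_events=20000, seed=1):
    """Monte Carlo estimate of the apparent mean off-time with a dead time.

    k_plus_L  : true off -> on rate, k_+ * [L] (1/s)
    k_minus   : true on -> off rate (1/s)
    dead_time : "on" intervals shorter than this are assumed to be missed (s)
    """
    rng = random.Random(seed)
    total, current, detected = 0.0, 0.0, 0
    while detected < n_events:
        current += rng.expovariate(k_plus_L)     # true off dwell
        on_dwell = rng.expovariate(k_minus)      # true on dwell
        if on_dwell < dead_time:
            current += on_dwell                  # missed: absorbed into off time
        else:
            total += current                     # detected: off interval ends
            current = 0.0
            detected += 1
    return total / n_events


true_tau_off = 1.0                                # 1 / k_plus_L for the rates below
no_dead_time = apparent_mean_off_time(1.0, 2.0, dead_time=0.0)
with_dead_time = apparent_mean_off_time(1.0, 2.0, dead_time=0.5)
```

Without a dead time the estimate stays close to the true mean off-time, while a dead time comparable to the mean on-time makes the apparent off-time several times longer, which in turn lowers the apparent association rate constant computed from Eq. (2).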



We apply the above result to estimate $k_{\mathrm{on}}$ and the relative
dead time $\sigma/\tau_{\mathrm{on}}$ in the experiment by Morimatsu et al.
From the best fit shown in Fig. 2 (dashed curve), we find
$k_{\mathrm{on}} = 1.15\ \mathrm{nM}^{-1}\,\mathrm{s}^{-1}$ and
$\sigma/\tau_{\mathrm{on}} = 2.19$. The dead time $\sigma$ is more than
twice the mean on-time and the dissociation constant $K_d$ is about 3 nM,
suggesting that the affinity between phosphorylated EGFR and Grb2 is
somewhat overestimated compared to experimental measurements of 700 nM and
30 nM. This result reflects the same structural limitation as for models
from Template II, which did not generate a good fit to the measured
$k_{\mathrm{on}}$ data.

F. Effect of an external buffer 

Here we examine another possible mechanism, in which an ambient buffer may
sequester ligands (specifically or non-specifically) and consequently
reduce the concentration of free ligands available to bind the single
molecule of interest. Unaccounted background buffering may cause ligand
dependence of $k_{\mathrm{on}}$. In the absence of a buffer, the effective
ligand concentration $[L]$ that interacts with the single molecule equals
the total ligand concentration $[L]_{\mathrm{tot}}$. Otherwise, ambient
buffers may reduce the effective ligand concentration available for
binding. We consider the following simple buffering mechanism:

$$\alpha L + B \;\overset{K_B}{\rightleftharpoons}\; L_{\alpha}B \tag{20}$$

where $\alpha > 1$ measures the (average) degree of binding cooperativity
between the ligand and the buffer group. In the presence of the buffer, the
free ligand concentration $[L]$ is the effective concentration of ligands
that interact with the single molecule of interest. We note that the
ambient buffer $B$ can be a mixture of several kinds of molecules that may
interact with the ligand pool. With a phenomenological equilibrium
dissociation constant $K_B$ and the total ligand concentration
$[L]_{\mathrm{tot}}$, we have

$$[L]^{\alpha}[B] = \frac{K_B}{\alpha}\left([L]_{\mathrm{tot}} - [L]\right) \tag{21}$$

Considering that buffer $B$ is in excess ($[B] \gg [L]_{\mathrm{tot}}$), so
that any changes in its concentration due to ligand binding are
insignificant, we have
$[L]^{\alpha} + \beta([L] - [L]_{\mathrm{tot}}) = 0$, where
$\beta = K_B/(\alpha[B])$. If the majority of ligands are sequestered
($[L] \ll [L]_{\mathrm{tot}}$), we can approximate the effective ligand
concentration as $[L] \approx (\beta[L]_{\mathrm{tot}})^{1/\alpha}$. Using
the two-state model of ligand-receptor binding (Eq. (1)), we fit (Fig. 2,
solid curve) the apparent association rate constant data from Morimatsu et
al. by $k_{\mathrm{on}} = k_{+}[L]/[L]_{\mathrm{tot}}$, and obtained
parameter values of $\alpha = 1.99$ and
$k_{+}\beta^{1/\alpha} = 2.03\ \mu\mathrm{M}^{-1/\alpha}\,\mathrm{s}^{-1}$.
The fitting results suggest strong cooperativity ($\alpha \approx 2$) for
Grb2 binding to the buffer. Although the fit did not directly resolve
$k_{+}$ and $\beta$, we can make a crude estimate of
$\beta \approx 1.8 \times 10^{-6}\ \mu\mathrm{M}$ (i.e.,
$K_B/[B] = 3.6 \times 10^{-6}\ \mu\mathrm{M}$) by assuming a
diffusion-limited second-order association rate constant
$k_{+} = 4\pi D s = 1.51 \times 10^{3}\ \mu\mathrm{M}^{-1}\,\mathrm{s}^{-1}$
(with $\alpha \approx 2$).
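The buffering relation $[L]^{\alpha} + \beta([L] - [L]_{\mathrm{tot}}) = 0$ has a single root between zero and $[L]_{\mathrm{tot}}$, so the free ligand concentration can be found numerically without the strong-sequestration approximation. The sketch below uses bisection; the function names are hypothetical, and the parameter values are the crude estimates quoted in the text with $\alpha$ rounded to 2, so the output should be read as an order-of-magnitude illustration rather than a reproduction of the published fit.

```python
def free_ligand(L_tot, alpha, beta, iterations=200):
    """Solve [L]**alpha + beta*([L] - L_tot) = 0 for [L] in (0, L_tot) by bisection."""
    lo, hi = 0.0, L_tot
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # The left-hand side increases with [L]; it is negative at 0
        # and positive at L_tot, so the sign tells which half holds the root.
        if mid ** alpha + beta * (mid - L_tot) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def apparent_k_on(L_tot, k_plus, alpha, beta):
    """Apparent association rate constant k_+ [L] / [L]_tot."""
    return k_plus * free_ligand(L_tot, alpha, beta) / L_tot


k_plus, alpha, beta = 1510.0, 2.0, 1.8e-6     # uM^-1 s^-1, dimensionless, uM
totals = (1e-4, 1e-3, 1e-2, 1e-1)             # total Grb2 concentration in uM
k_on_values = [apparent_k_on(x, k_plus, alpha, beta) for x in totals]
approx_values = [k_plus * (beta * x) ** (1.0 / alpha) / x for x in totals]
```

Over 0.1 to 100 nM the computed `k_on_values` fall by more than an order of magnitude, and they approach `approx_values` as the total concentration increases, where the sequestered fraction is largest.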

We note that although the above external-buffer model produced the closest
agreement with the data of Morimatsu et al. among the models considered
(Fig. 2), partially owing to the mathematical properties of the fitting
function, it remains unclear whether the experimental setup introduced a
chemical or physical environment that might serve as an ambient buffer for
Grb2.



IV. DISCUSSION 

Molecular binding is an essential biochemical interaction, which can now be
probed at the single-molecule level with fluorescence techniques such as
Förster (fluorescence) resonance energy transfer and TIR-FM [21]. These
techniques unveil interaction details that are often unavailable in data
obtained from ensemble-averaged experiments. Proper interpretation of the
fluorescence time series for binding of a single molecule by its partner
protein (or ligand) requires caution. In particular, the phenomenological
rate constants $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$, as well as the
dissociation constant $K_d$, extracted from the fluorescence time series
may change as the ligand concentration varies, and this dependence carries
important information about the binding biochemistry and its experimental
environment. Model-based analysis of the ligand dependence of kinetic
parameters can help to uncover the underlying mechanisms.

In this paper we explore the influence of various mechanistic and technical factors, specifically single-site and multisite non-fluorescent binding, non-equilibrium steady states, missed events, and ambient buffers, which could potentially introduce dependence of mean waiting times, and thus apparent kinetic rate constants, on ligand concentration. A combination of these factors can further obscure the analysis of single-molecule kinetics, which then requires the assistance of appropriate kinetic models.

We have shown that molecular conformation fluctuation (or dynamic disorder) alone does not cause concentration dependence under the condition of detailed balance, in models that reach equilibrium steady states, as long as each ligand-induced state transition is experimentally resolved (Template I, Fig. 1). In this case, kinetic rate constants inferred from mean waiting times agree with those measured by ensemble-averaged experiments.

Unobserved ligand binding, whatever its biochemical origin, is the essential source of ligand dependence of the waiting times, which we analyzed using kinetic models that invoke non-fluorescent liganded states. Different models generate different mathematical structures of ligand dependence. In general, a kinetic rate constant, $k_{\mathrm{on}}$ or $k_{\mathrm{off}}$, is a rational function of ligand concentration. Models with non-fluorescent liganded states for a molecule that has a single ligand binding site (Template II, Fig. 1) predict that $k_{\mathrm{on}}$ has an inverse linear relationship with ligand concentration [L] (Eq. 5), whereas $k_{\mathrm{off}}$ remains unmodulated by [L]. Models of a molecule with two ligand binding sites, one of which is non-fluorescent when bound to ligand (Template III, Fig. 1), predict that both $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ have a sigmoidal relationship with the ligand concentration.

Unmonitored binding can also be caused by short transitions, called missed events, whose durations fall within the dead time of the experimental instrument. Missed events produce a concentration dependence similar in form to that of the single-site non-fluorescent binding models (Template II, Fig. 1). Our results coincide with a similar three-state model proposed by Crouzy and Sigworth to account for missed events in single-ion-channel recordings, in which transient transitions between a closed state and a short-lived state were used to capture events that were beyond the instrumental resolution. It is known from the analysis of single-ion-channel recordings that unaccounted missed events due to a fixed dead time can distort the waiting time distributions and cause overestimation of waiting times. Such a limitation may carry over and cause ligand concentration dependence in single-protein fluorescent binding experiments.

The aforementioned models were studied under the condition of detailed balance. Another likely source of ligand dependence is the violation of detailed balance in model parameters, which can be studied using non-equilibrium models. This mechanism is ignored in most studies. The typical assumption of a single-molecule analysis is that the system has relaxed to thermodynamic equilibrium at the steady state. The equilibrium assumption is rather strong and requires the system to meet stringent conditions (thermodynamically, the system must be isolated, without exchange of energy or material with its external environment), and it may not always be justified, in particular for in vivo systems that entail many energy-driven reactions$^{24}$ or for in vitro systems that are sustained by energy sources. Violation of detailed balance can be tested by analyzing time series data. For example, two-dimensional joint waiting time distributions of two consecutive events, the waiting time for a binding event followed by that for a dissociation event, can be used to test whether detailed balance holds by checking time reversibility (also see Ref. 25 for other methods). In a non-equilibrium model, the parameters are not constrained by detailed balance. Such a model achieves a steady state with net fluxes around reaction loops, which gives rise to ligand dependence of rate constants as rational functions of ligand concentration.

We note that our analysis of the effect of detailed balance is closely related to recent works that study the substrate dependence of the enzymatic turnover rate $v$ in fluctuating enzymes with multiple conformational channels$^{26,27}$. Under the detailed balance condition, the dependence of the product formation velocity on substrate concentration [S] maintains the classic Michaelis-Menten form, $v = k[S]/(K_m + [S])$, where the effective catalytic rate constant $k$ and the apparent Michaelis-Menten constant $K_m$ can be derived from kinetic parameters of the model for the enzyme system. When the condition of detailed balance does not hold, $v$ becomes in general a rational function of [S]. To demonstrate that the results from our work can also be applied to analyze the turnover rate of multi-conformational enzymes, consider a general scheme of an enzymatic network, where an enzyme fluctuates among several ($m$) conformations, forming parallel and interconnected catalytic channels. Through each channel, the enzyme engages the substrate and then undergoes multiple ($n$) reversible intermediate steps before finally converting the substrate into a product. The turnover rate can be expressed as the sum of the turnover rates in all individual channels: $v = \sum_{i=1}^{m} \eta_{ni} k_i$, where $\eta_{ni}$ is the steady-state residence probability at the last ($n$th) substrate-bound step of the $i$th channel and $k_i$ is the catalytic rate constant of that channel. We can write $\eta_{ni} = \tilde{\eta}_{ni}/Z$, where $\tilde{\eta}_{ni}$ is the unnormalized residence probability and the partition function $Z = \sum_{i}\sum_{j} \tilde{\eta}_{ji}$ sums over all states of all channels. As shown in Section III B, under detailed balance $\tilde{\eta}_{ni}$ is proportional to the substrate concentration [S] and $Z$ is a linear function of [S]. Thus, the conventional Michaelis-Menten form is preserved in $v$. Without detailed balance, the turnover rate $v$ assumes a rational functional form of [S], which can be obtained systematically using the graphical method shown in Appendix C.

Finally, we studied the effect of an external buffer group that sequesters ligands, which, if unaccounted for, could cause strong ligand dependence in rate constants. The extent of the buffering effect depends on the biochemical nature of the ligand-buffer interaction and the relative availability of the buffer group. It is natural to consider buffering in the environment of living cells, where molecules are subject to ubiquitous binding reactions in crowded molecular surroundings through specific and/or non-specific interactions.

We applied our results to analyze the experimental data of labeled Grb2 binding to EGFR molecules by Morimatsu et al. We examined the possibility that missed events due to transient binding (which we showed to be equivalent to Template II) were the source of the ligand dependence of the apparent association rate constant, and found that the best fit could not accurately reproduce the data. The mathematically simplest and best fit resulted from assuming that there were background Grb2 buffers characterized by two parameters accounting for cooperativity and affinity. Non-equilibrium models with detailed balance violation were not applied to analyze the data because the in vitro experiments by Morimatsu et al. were apparently performed under equilibrium conditions. Elucidation of the most likely mechanism requires further experimental investigation.
APPENDIX

A. Aggregated Markov model 

An aggregated Markov model of single-molecule kinetics can be described by the following master equation:

$$\frac{dP(t)}{dt} = P(t)\,Q , \tag{A1}$$

where entry $p_{ij}$ in matrix $P$ is the probability of being in state $j$ at time $t$ when the system was in state $i$ at $t = 0$. Matrix $Q$ is called the "generator matrix". For systems with "on" and "off" aggregates, $Q$ can be organized and partitioned as



$$Q = \begin{pmatrix} Q_{oo} & Q_{oc} \\ Q_{co} & Q_{cc} \end{pmatrix} , \tag{A2}$$



where diagonal blocks contain intra-aggregate transi- 
tion rates and off-diagonal blocks contain inter-aggregate 
transition rates. Letters o and c denote "on" and "off" 
aggregates, respectively. 

The on-time distribution is given by

$$f_{\mathrm{on}}(t) = \pi_o\, e^{Q_{oo} t}\, Q_{oc}\, u_c , \tag{A3}$$



where vector $\pi_o$ is the steady-state distribution of "on" aggregate entry probabilities over the "on" states. It is given by the steady-state probability flux into individual "on" states from "off" states, normalized by the total probability flux into the "on" aggregate: $\pi_o = w_c Q_{co}/(w_c Q_{co} u_o)$, where $u_o$ and $u_c$ are column vectors of ones. The mean on-waiting time is calculated as:

$$\tau_{\mathrm{on}} = \int_0^{+\infty} t\, f_{\mathrm{on}}(t)\, dt = \frac{w_o u_o}{w_o Q_{oc} u_c} . \tag{A4}$$

The off-time $\tau_{\mathrm{off}}$ is similarly obtained.
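The mean waiting times of Eq. A4 can be evaluated numerically for any small generator matrix. The sketch below is a minimal pure-Python illustration, not the procedure used in this work: it solves $wQ = 0$ with $\sum_i w_i = 1$ by Gaussian elimination and then forms the ratio of aggregate occupancy to inter-aggregate flux. The function names and the two-state example rates are assumptions chosen for illustration.

```python
def steady_state(Q):
    """Solve w Q = 0 with sum(w) = 1 by Gauss-Jordan elimination."""
    n = len(Q)
    # Transposed system Q^T w^T = 0; the last equation is replaced
    # by the normalization condition sum(w) = 1.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def mean_dwell_times(Q, on_states):
    """Return (tau_on, tau_off) as occupancy divided by inter-aggregate flux."""
    on = sorted(set(on_states))
    off = [s for s in range(len(Q)) if s not in on]
    w = steady_state(Q)
    p_on = sum(w[i] for i in on)
    p_off = sum(w[j] for j in off)
    # On -> off flux, w_o Q_oc u_c; at steady state it equals the off -> on flux.
    flux = sum(w[i] * Q[i][j] for i in on for j in off)
    return p_on / flux, p_off / flux


# Hypothetical two-state example: state 0 is "off", state 1 is "on".
k_plus_L, k_minus = 1.0, 4.0
Q_two_state = [[-k_plus_L, k_plus_L],
               [k_minus, -k_minus]]
tau_on, tau_off = mean_dwell_times(Q_two_state, on_states=[1])
print(tau_on, tau_off)
```

For the two-state example the function should return $\tau_{\mathrm{on}} = 1/k_-$ and $\tau_{\mathrm{off}} = 1/(k_+[L])$, which is a convenient sanity check before applying it to larger schemes.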



B. Violation of detailed balance in the three-state model 

Here we derive the mean "on" and "off" waiting times 
for the three-state model shown in Fig. 3 in the main 
text. If we arrange the states in the order as labeled in 
the figure, the generator matrix for this model is given 
by: 
$$Q = \begin{pmatrix} -k_{12} - k_{13}[L] & k_{12} & k_{13}[L] \\ k_{21} & -k_{21} - k_{23}[L] & k_{23}[L] \\ k_{31} & k_{32} & -k_{31} - k_{32} \end{pmatrix} . \tag{A5}$$
One can find the steady-state occupancy (the left null space of $Q$) as:

$$w = \frac{1}{Z} \begin{pmatrix} k_{21}k_{31} + k_{21}k_{32} + k_{23}k_{31}[L] \\ k_{12}k_{31} + k_{12}k_{32} + k_{13}k_{32}[L] \\ (k_{13}k_{21} + k_{12}k_{23})[L] + k_{13}k_{23}[L]^2 \end{pmatrix}^{T} , \tag{A6}$$



where $Z$ is the partition function that normalizes $w$, i.e., $Z$ is the sum of the three unnormalized entries of $w$. The magnitude of the net flux (regardless of the direction) around the reaction loop can be calculated as:

$$J_n = |w_1 k_{12} - w_2 k_{21}| = \frac{[L]}{Z}\,|k_{12}k_{23}k_{31} - k_{21}k_{13}k_{32}| . \tag{A7}$$

If the model satisfies detailed balance, then $J_n = 0$ and the model parameters obey the following constraint:

$$\frac{k_{12}k_{23}k_{31}}{k_{13}k_{32}k_{21}} = 1 . \tag{A8}$$

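Thus, the equilibrium state probability can be reduced to:

$$w = \frac{1}{Z} \begin{pmatrix} k_{23}k_{31} \\ k_{13}k_{32} \\ k_{13}k_{23}[L] \end{pmatrix}^{T} , \tag{A9}$$

where $Z$ again denotes the sum of the listed entries.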

The mean "on" and "off" times can be calculated from Eq. 4 in the main text:

$$\tau_{\mathrm{on}} = \frac{1}{k_{31} + k_{32}} , \qquad \tau_{\mathrm{off}} = \frac{k_{23}k_{31} + k_{13}k_{32}}{k_{13}k_{23}(k_{31} + k_{32})[L]} . \tag{A10}$$



Since $k_{\mathrm{on}} = 1/(\tau_{\mathrm{off}}[L])$ and $k_{\mathrm{off}} = 1/\tau_{\mathrm{on}}$, both the apparent association and dissociation rate constants, $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$, are ligand independent. This is consistent with the results shown in the main text (Detailed balance). We note that in the three-state model the mean "off" time is also independent of the intra-aggregate transition rates $k_{12}$ and $k_{21}$.
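The ligand independence under detailed balance can be checked numerically. The sketch below evaluates the unnormalized steady-state weights of the three-state model (two unliganded states 1 and 2, one liganded state 3), imposes the loop constraint $k_{12}k_{23}k_{31} = k_{13}k_{32}k_{21}$ by solving it for $k_{32}$, and prints the loop flux and apparent rate constants at several ligand concentrations. The rate values and function names are arbitrary illustrative assumptions, not fitted parameters.

```python
import math


def occupancy(k, L):
    """Unnormalized steady-state weights (w1, w2, w3) of the three-state loop."""
    w1 = k["21"] * k["31"] + k["21"] * k["32"] + k["23"] * k["31"] * L
    w2 = k["12"] * k["31"] + k["12"] * k["32"] + k["13"] * k["32"] * L
    w3 = (k["13"] * k["21"] + k["12"] * k["23"]) * L + k["13"] * k["23"] * L**2
    return w1, w2, w3


def loop_flux(k, L):
    """Magnitude of the net flux around the loop, |w1 k12 - w2 k21| / Z."""
    w1, w2, w3 = occupancy(k, L)
    return abs(w1 * k["12"] - w2 * k["21"]) / (w1 + w2 + w3)


def apparent_rates(k, L):
    """k_on = 1/(tau_off [L]) and k_off = 1/tau_on; state 3 is the only on state."""
    w1, w2, w3 = occupancy(k, L)
    flux = w3 * (k["31"] + k["32"])      # unnormalized on -> off flux
    tau_on = w3 / flux
    tau_off = (w1 + w2) / flux
    return 1.0 / (tau_off * L), 1.0 / tau_on


# Arbitrary rates; k32 is fixed by the detailed balance constraint.
rates = {"12": 1.0, "21": 2.0, "13": 3.0, "23": 4.0, "31": 5.0}
rates["32"] = rates["12"] * rates["23"] * rates["31"] / (rates["13"] * rates["21"])

for L in (0.01, 1.0, 100.0):
    k_on, k_off = apparent_rates(rates, L)
    print(f"[L]={L:g}  J={loop_flux(rates, L):.1e}  k_on={k_on:.6f}  k_off={k_off:.6f}")
```

With the constraint imposed, the printed loop flux should vanish to rounding error and both apparent rate constants should be the same at every concentration.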

When detailed balance does not hold in the model, the mean "on" time remains unchanged while the mean "off" time assumes the following form:

$$\tau_{\mathrm{off}} = \frac{(k_{12} + k_{21})(k_{31} + k_{32}) + (k_{23}k_{31} + k_{13}k_{32})[L]}{(k_{31} + k_{32})\,\bigl(k_{12}k_{23} + k_{13}(k_{21} + k_{23}[L])\bigr)\,[L]} , \tag{A11}$$

and the apparent association rate constant is:

$$k_{\mathrm{on}} = \frac{(k_{31} + k_{32})\,\bigl(k_{12}k_{23} + k_{13}(k_{21} + k_{23}[L])\bigr)}{(k_{12} + k_{21})(k_{31} + k_{32}) + (k_{23}k_{31} + k_{13}k_{32})[L]} , \tag{A12}$$

which has a ligand dependence similar in form to that of models from Template III (Eq. 12) and may potentially fit the data with the same quality. In this specific model, the mean "on" time is a constant and does not depend on ligand concentration. In general, as we show below, both $\tau_{\mathrm{on}}$ and $\tau_{\mathrm{off}}$ are ligand dependent in non-equilibrium models.
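To illustrate how the apparent association rate constant of the three-state loop interpolates between two plateaus when detailed balance is violated, the sketch below evaluates the rational expression for $k_{\mathrm{on}}$ given above, together with its two limits obtained by letting $[L] \to 0$ and $[L] \to \infty$. The rate values are arbitrary and deliberately violate the loop constraint; all names are illustrative assumptions.

```python
def k_on_three_state(k12, k21, k13, k23, k31, k32, L):
    """Apparent association rate constant of the three-state loop model."""
    num = (k31 + k32) * (k12 * k23 + k13 * (k21 + k23 * L))
    den = (k12 + k21) * (k31 + k32) + (k23 * k31 + k13 * k32) * L
    return num / den


def k_on_limits(k12, k21, k13, k23, k31, k32):
    """Limits of k_on for [L] -> 0 and [L] -> infinity."""
    low = (k12 * k23 + k13 * k21) / (k12 + k21)
    high = (k31 + k32) * k13 * k23 / (k23 * k31 + k13 * k32)
    return low, high


# Arbitrary rates with k12*k23*k31 != k21*k13*k32 (detailed balance violated).
noneq = dict(k12=1.0, k21=2.0, k13=3.0, k23=4.0, k31=5.0, k32=1.0)

low, high = k_on_limits(**noneq)
print(f"low-[L] limit: {low:.4f}, high-[L] limit: {high:.4f}")
for L in (1e-3, 1e-1, 1.0, 1e1, 1e3):
    print(f"[L]={L:g}  k_on={k_on_three_state(L=L, **noneq):.4f}")
```

If $k_{32}$ is instead chosen to satisfy the loop constraint, the two limits coincide and the function returns the same value at every concentration.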



C. Ligand dependence in a general scheme for single-site 
binding 

Obtaining an analytical solution for the steady-state probability distribution $w$ of a general reaction scheme by directly finding the left null space of the generator matrix $Q$ is unwieldy. As an alternative, one can obtain $w$ by a known graph-theoretical approach used in non-equilibrium statistical mechanics, which solves for the steady-state distribution of a non-equilibrium system, as we show below. Here, we only consider single-site fluorescent binding and assume that the connection between any two states consists of both forward and backward transitions. Below, we first introduce how to use the method to systematically obtain the steady-state probabilities, and then derive the general formula for the ligand dependence of rate constants.

The method involves enumerating all distinct spanning trees of the topology of a given reaction scheme. A spanning tree of an (undirected) graph is a tree, built from edges of the original graph, that connects all the nodes of the graph. For a topology that has $N$ nodes (states), the maximum possible number of distinct spanning trees is $N^{N-2}$, attained by the fully connected topology in which every pair of nodes is directly connected. Fig. 4A shows all distinct spanning trees for an example four-state model.
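For small reaction schemes the spanning trees can be enumerated by brute force. The sketch below is an illustrative assumption rather than the procedure used in this work: it tests every subset of $N-1$ edges with a union-find structure and keeps those without loops. The four-state loop is a hypothetical topology chosen because it has four spanning trees; it is not necessarily the scheme drawn in the figure.

```python
from itertools import combinations


def is_spanning_tree(n_nodes, edge_subset):
    """True if the undirected edges form a loop-free set of n_nodes - 1 edges."""
    parent = list(range(n_nodes))            # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression
            x = parent[x]
        return x

    for a, b in edge_subset:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:                 # this edge would close a loop
            return False
        parent[root_a] = root_b
    return len(edge_subset) == n_nodes - 1


def spanning_trees(n_nodes, edges):
    """Enumerate all spanning trees by brute force (suitable for small schemes)."""
    return [tree for tree in combinations(edges, n_nodes - 1)
            if is_spanning_tree(n_nodes, tree)]


fully_connected = list(combinations(range(4), 2))   # every pair of 4 states linked
four_state_loop = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-state cycle

print(len(spanning_trees(4, fully_connected)))   # 16 = 4**(4 - 2)
print(len(spanning_trees(4, four_state_loop)))   # 4
```

The brute-force search grows combinatorially with the number of edges, so it is only meant for schemes with a handful of states.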

For a state $k$, any given undirected spanning tree has a corresponding directed spanning tree $s$ in which all unidirectional edges (transitions) lead toward $k$ (see Fig. 4A for examples). One can view state $k$ as the root of the tree, with every directed edge pointing from an offspring node toward the root. Let $V_{ks}$ be the product of all transition rates associated with the edges in $s$. It is an established result that the steady-state probability for the system to reside in state $k$ is given by$^{25}$:

$$w_k = \frac{1}{Z} \sum_{s=1}^{N_s} V_{ks} , \tag{A13}$$

where $N_s$ is the number of distinct spanning trees and $Z = \sum_{k=1}^{N} \sum_{s=1}^{N_s} V_{ks}$ is the partition function for normalization. We define the unnormalized steady-state probability vector as:

$$\tilde{w} = w Z , \tag{A14}$$

each entry of which, as we will show, is a polynomial function of ligand concentration [L]. We consider an aggregated Markov model of a single molecule binding a ligand, with two aggregates of states: a liganded (fluorescent) aggregate and an unliganded (non-fluorescent) aggregate.

With the above preparation, we can now show that for the $i$th state of the "on" aggregate in a given directed spanning tree $s$, $V_{is}$ is a monomial function of [L] of the form $V_{is} = \alpha_{is}[L]^{c_s}$, where the integer $c_s$ is the number of disjointed subtrees of c states in $s$ and $\alpha_{is}$ is the product of rate constants of the transitions in $s$. A spanning tree $s$ partitions all c states into $c_s$ ($1 \le c_s \le N_c$) disjointed c subtrees (DCS) (see Fig. 4B for an example). Each DCS contains only connected c states forming a subnetwork. DCSs have no direct connections to each other, only connections via some o state(s). Each DCS connects to the o aggregate through gateway c states that have direct links with some gateway o states. Similarly, o states in $s$ form several disjointed o subtrees (DOS). For the $i$th state in the o aggregate, according to Eq. A13 the corresponding directed spanning tree provides an additive contribution to the unnormalized steady-state probability $\tilde{w}_{oi}$, which contains one and only one factor proportional to [L], from a gateway transition, for each DCS, such that $V_{is} = \alpha_{is}[L]^{c_s}$. The claim that only one [L]-dependent transition leads from a DCS toward the $i$th o state rests on the observation that if more than one such transition existed there would be loop(s) in the spanning tree, which is a contradiction. The result holds for any o state in the spanning tree $s$. The result $V_{js} = \beta_{js}[L]^{c_s - 1}$ for the $j$th c state can be derived using similar arguments.

FIG. 4. A. A four-state model with 4 distinct spanning trees (I, II, III and IV). Ligand-dependent transitions are labeled with [L]. Each of states 3 and 4 has a ligand bound. All links between two states contain a forward transition and a backward transition. In all spanning trees, transitions leading to state 4 are highlighted as thick arrows. B. An illustrative example of a spanning tree that shows 3 disjointed c subtrees (DCS, dashed boxes 1, 2 and 3) connecting to o states in 2 disjointed o subtrees (DOS, dashed boxes I and II) through gateway states (labeled with *'s). The spanning tree can be viewed as a hierarchical acyclic bipartite graph of DCSs and DOSs. The directed edges are shown for directed spanning trees that have root nodes in DOS I. In this spanning tree, the contribution to the unnormalized steady-state probability for an o state is proportional to $[L]^3$, and that for a c state is proportional to $[L]^2$.



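Summing up contributions from all distinct spanning trees for a topology, we obtain the unnormalized steady-state probability for state $i$ in the o aggregate and state $j$ in the c aggregate:

$$\tilde{w}_{oi} = \sum_{s=1}^{N_s} \alpha_{is}[L]^{c_s} , \qquad \tilde{w}_{cj} = \sum_{s=1}^{N_s} \beta_{js}[L]^{c_s - 1} . \tag{A15}$$

Therefore, the unnormalized steady-state "on" and "off" probabilities are given as:

$$\tilde{P}_{\mathrm{on}} = \sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \alpha_{is}[L]^{c_s} , \qquad \tilde{P}_{\mathrm{off}} = \sum_{j=1}^{N_c} \sum_{s=1}^{N_s} \beta_{js}[L]^{c_s - 1} . \tag{A16}$$

The steady-state inter-aggregate flux calculated using the unnormalized "on" probability is:

$$\tilde{J} = \sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \gamma_i \alpha_{is}[L]^{c_s} , \tag{A17}$$

where $\gamma_i$ is the $i$th entry in the vector $Q_{oc} u_c$. Thus, the mean "on" and "off" times are given as:

$$\tau_{\mathrm{on}} = \frac{\tilde{P}_{\mathrm{on}}}{\tilde{J}} = \frac{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \alpha_{is}[L]^{c_s}}{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \gamma_i \alpha_{is}[L]^{c_s}} , \tag{A18}$$

$$\tau_{\mathrm{off}} = \frac{\tilde{P}_{\mathrm{off}}}{\tilde{J}} = \frac{\sum_{j=1}^{N_c} \sum_{s=1}^{N_s} \beta_{js}[L]^{c_s - 1}}{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \gamma_i \alpha_{is}[L]^{c_s}} . \tag{A19}$$

The apparent rate constants are:

$$k_{\mathrm{on}} = \frac{1}{\tau_{\mathrm{off}}[L]} = \frac{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \gamma_i \alpha_{is}[L]^{c_s}}{\sum_{j=1}^{N_c} \sum_{s=1}^{N_s} \beta_{js}[L]^{c_s}} , \tag{A20}$$

$$k_{\mathrm{off}} = \frac{1}{\tau_{\mathrm{on}}} = \frac{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \gamma_i \alpha_{is}[L]^{c_s}}{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \alpha_{is}[L]^{c_s}} . \tag{A21}$$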



Therefore, both $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ approach constants in the limits of small and large ligand concentrations. At intermediate [L], each apparent rate constant is a rational function of [L] whose numerator and denominator are polynomials of the same structure. With detailed balance, a common [L]-dependent term factors out of both the numerator and the denominator, and the ligand dependence cancels. The apparent dissociation constant is given by the following rational function of [L]:

$$K_d = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} = \frac{\sum_{j=1}^{N_c} \sum_{s=1}^{N_s} \beta_{js}[L]^{c_s}}{\sum_{i=1}^{N_o} \sum_{s=1}^{N_s} \alpha_{is}[L]^{c_s}} . \tag{A22}$$



D. Missed event 

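Consider a system with a dead time $\sigma$ for detecting binding events. The probability of a missed binding event is:

$$p_\sigma = \int_0^{\sigma} \frac{1}{\tau_{\mathrm{on}}} e^{-t/\tau_{\mathrm{on}}}\, dt = 1 - e^{-\sigma/\tau_{\mathrm{on}}} . \tag{A23}$$

Let $q_\sigma = 1 - p_\sigma$. The mean dead time (the mean duration of a missed event) is calculated as:

$$\tau_\sigma = \frac{1}{p_\sigma} \int_0^{\sigma} \frac{t}{\tau_{\mathrm{on}}} e^{-t/\tau_{\mathrm{on}}}\, dt = \tau_{\mathrm{on}} - \sigma \frac{q_\sigma}{p_\sigma} . \tag{A24}$$

Assuming binding and dissociation events are independent, the probability that one misses $k$ consecutive short events is $p_\sigma^k q_\sigma$. The apparent mean off-time, $\bar{\tau}_{\mathrm{off}}$, is given as:

$$\bar{\tau}_{\mathrm{off}} = \sigma + \sum_{k=0}^{\infty} q_\sigma p_\sigma^k \left[ k\tau_\sigma + (k+1)\tau_{\mathrm{off}} \right] = \frac{\tau_{\mathrm{off}} + p_\sigma \tau_{\mathrm{on}}}{q_\sigma} , \tag{A25}$$

where the $\sigma$ accounts for the dead time skipped before the onset of the next detectable on-time interval.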
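The closed form for the apparent mean off-time can be checked against a direct evaluation of the series. The sketch below is an illustration under assumed parameter values: it computes the probability of a missed on-event, the mean duration of a missed event, a truncated version of the sum over $k$ missed events, and the closed-form ratio. The function name, the truncation length, and the numbers are arbitrary choices.

```python
import math


def missed_event_stats(tau_on, tau_off, sigma, n_terms=2000):
    """Return (p, tau_sigma, series, closed) for a dead time sigma > 0."""
    p = 1.0 - math.exp(-sigma / tau_on)        # P(on-time shorter than sigma)
    q = 1.0 - p
    tau_sigma = tau_on - sigma * q / p         # mean duration of a missed on-event
    # sigma plus the truncated sum over k missed events, each weighted by q * p**k.
    series = sigma + sum(
        q * p**k * (k * tau_sigma + (k + 1) * tau_off) for k in range(n_terms)
    )
    closed = (tau_off + p * tau_on) / q        # closed-form apparent off-time
    return p, tau_sigma, series, closed


p, tau_sigma, series, closed = missed_event_stats(tau_on=1.0, tau_off=2.0, sigma=0.3)
print(f"p = {p:.4f}, tau_sigma = {tau_sigma:.4f}")
print(f"series = {series:.6f}, closed form = {closed:.6f}")
```

The two values should agree to numerical precision, and both exceed the true mean off-time because missed on-events merge neighboring off-intervals.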

Similarly, the apparent mean on-time is given as $\bar{\tau}_{\mathrm{on}} = (\tau_{\mathrm{on}} + p_\delta \tau_{\mathrm{off}})/q_\delta$, where $p_\delta = 1 - e^{-k_+[L]\delta}$ is the probability that an off-time is shorter than the dead time $\delta$ for detecting a dissociation event, and $q_\delta = 1 - p_\delta$. When [L] is very small (i.e., $p_\delta \approx 0$), it is unlikely that a waiting time for a dissociation event falls within the dead time $\delta$, and therefore $\bar{\tau}_{\mathrm{on}}$ is not significantly affected by missed events.